Corpus-Based Induction of Lexical Representation and Meaning
Author
Abstract
The acquisition of linguistic knowledge, i.e., the identification, extraction, and encoding of linguistic information in a corpus, has been one of the main motivations for data-driven approaches to natural language. Methods have been developed for the acquisition of, for instance, parts of speech, noun compounds, collocations, support verbs, subcategorization frames, phrase structure rules, selectional restrictions, and word senses (cf. Armstrong (1993) for an overview). Drawing on this body of research, I am investigating the acquisition of lexical semantic knowledge from corpora, thereby addressing the logical problem of language acquisition, one of the fundamental issues in linguistics and cognitive science. My guiding assumption is that syntactic as well as semantic representations are projected from information in the lexicon, and that a crucial part of the relevant lexical information is the result of language experience and can hence be induced from corpora. The proposed research comprises three main subtasks: (a) inducing different types of ("low-level") lexical semantic information (i.e., subcategorization frames, selectional restrictions, and semantic classes) using established corpus-based methods; (b) combining the induced types of lexical semantic information into ("high-level") semantic representations, based on existing theories of the lexicon; and (c) evaluating the resulting model against human intuitions. By applying corpus-based techniques to lexical semantics, a classical representational problem in linguistics, I hope to help bridge the gap between current data-driven approaches to language and the knowledge-driven methods of traditional linguistics.
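As a rough illustration of the kind of "low-level" corpus statistic mentioned in subtask (a), the sketch below scores verb–object pairs with pointwise mutual information (PMI), a common association measure for collocations and a crude proxy for selectional preferences. This is an assumption for illustration only: the abstract does not say which measure this research uses, and the function name `verb_object_pmi` and the toy pairs are hypothetical. PMI is defined as PMI(v, n) = log2(P(v, n) / (P(v) P(n))), with probabilities estimated here by relative frequency.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def verb_object_pmi(pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """Return PMI (in bits) for every observed (verb, object-noun) pair.

    Hypothetical helper: PMI(v, n) = log2(P(v, n) / (P(v) * P(n))),
    with all probabilities estimated by relative frequency over the pairs.
    """
    pairs = list(pairs)  # materialize so the data can be counted several times
    if not pairs:
        return {}
    total = len(pairs)
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    noun_counts = Counter(noun for _, noun in pairs)

    scores: Dict[Pair, float] = {}
    for (verb, noun), count in pair_counts.items():
        p_joint = count / total
        p_verb = verb_counts[verb] / total
        p_noun = noun_counts[noun] / total
        scores[(verb, noun)] = math.log2(p_joint / (p_verb * p_noun))
    return scores


# Hypothetical toy data: (verb, direct-object head) pairs, as if they had
# been extracted from a parsed corpus.
observed = [
    ("drink", "coffee"), ("drink", "water"), ("drink", "coffee"),
    ("eat", "bread"), ("eat", "apple"), ("read", "book"),
]

if __name__ == "__main__":
    ranked = sorted(verb_object_pmi(observed).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 3))
```

In this toy run the single-occurrence pair ("read", "book") receives the highest score, which illustrates PMI's well-known bias toward rare events. Practical acquisition systems therefore usually apply a frequency cutoff or switch to measures such as the log-likelihood ratio.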
Similar resources
A Corpus-Based Study of the Lexical Make-up of Applied Linguistics Article Abstracts
This paper reports results from a corpus-based study that explored the frequency of words in the abstracts of applied linguistics journal articles. The abstracts of major articles in leading applied linguistics journals, published since 2005 up to November 2001 were analyzed using software modules from the Compleat Lexical Tutor. The output includes a list of the most frequent content words, list...
A Corpus-based Study of Lexical Bundles in Discussion Section of Medical Research Articles
There has been increasing interest in utilizing corpora in linguistic research and pedagogy in recent years. Rhetorical organization of different sections of research articles may appear similar in various disciplines, but close examination may show subtle differences nonetheless. One of the features that has been at the center of attention especially in recent years is the idiomaticity of a di...
The Impact of Teaching Corpus-based Collocation on EFL Learners' Writing Ability
The present study explores the impact of corpus-based collocation instruction on intermediate Iranian EFL learners' writing ability. For this study, 84 Iranian learners, studying English as a foreign language in Bayan Institute, Iran, were selected and were randomly divided into two groups, experimental and control. Conventional methods of writing instruction were taught to the control...
Developing a Corpus-Based Word List in Pharmacy Research Articles: A Focus on Academic Culture
The present corpus-based lexical study reports the development of a Pharmacy Academic Word List (PAWL): a list of the most frequent words from a corpus of 3,458,445 tokens made up of the 800 most recent pharmacy texts, including research articles, review articles, and short communications in four sub-disciplines of pharmacy. WordSmith (Scott, 2017) and AntWordProfiler (Anthony, 2014) were used to sc...